 binding energy



FLOWR.root: A flow matching based foundation model for joint multi-purpose structure-aware 3D ligand generation and affinity prediction

Cremer, Julian, Le, Tuan, Ghahremanpour, Mohammad M., Sługocka, Emilia, Menezes, Filipe, Clevert, Djork-Arné

arXiv.org Artificial Intelligence

We present FLOWR:root, an equivariant flow-matching model for pocket-aware 3D ligand generation with joint binding affinity prediction and confidence estimation. The model supports de novo generation, pharmacophore-conditional sampling, fragment elaboration, and multi-endpoint affinity prediction (pIC50, pKi, pKd, pEC50). Training combines large-scale ligand libraries with mixed-fidelity protein-ligand complexes, followed by refinement on curated co-crystal datasets and parameter-efficient finetuning for project-specific adaptation. FLOWR:root achieves state-of-the-art performance in unconditional 3D molecule generation and pocket-conditional ligand design, producing geometrically realistic, low-strain structures. The integrated affinity prediction module demonstrates superior accuracy on the SPINDR test set and outperforms recent models on the Schrodinger FEP+/OpenFE benchmark with substantial speed advantages. As a foundation model, FLOWR:root requires finetuning on project-specific datasets to account for unseen structure-activity landscapes, yielding strong correlation with experimental data. Joint generation and affinity prediction enable inference-time scaling through importance sampling, steering molecular design toward higher-affinity compounds. Case studies validate this: selective CK2$α$ ligand generation against CLK3 shows significant correlation between predicted and quantum-mechanical binding energies, while ER$α$, TYK2 and BACE1 scaffold elaboration demonstrates strong agreement with QM calculations. By integrating structure-aware generation, affinity estimation, and property-guided sampling, FLOWR:root provides a comprehensive foundation for structure-based drug design spanning hit identification through lead optimization.
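The inference-time steering mentioned in this abstract can be illustrated with a generic sampling-importance-resampling step. This is a minimal sketch, not the FLOWR:root implementation: the exponential weighting, the `temperature` parameter, the function names, and the ligand scores are all assumptions made for illustration.

```python
import math
import random


def softmax_weights(scores, temperature=1.0):
    """Turn predicted affinities into normalised importance weights."""
    top = max(scores)  # subtract the maximum for numerical stability
    raw = [math.exp((s - top) / temperature) for s in scores]
    total = sum(raw)
    return [r / total for r in raw]


def importance_resample(candidates, scores, n_keep, temperature=1.0, seed=0):
    """Resample candidates with probability proportional to their weights."""
    weights = softmax_weights(scores, temperature)
    rng = random.Random(seed)
    return rng.choices(candidates, weights=weights, k=n_keep)


# Hypothetical ligands with made-up predicted pIC50 values.
ligands = ["lig_a", "lig_b", "lig_c"]
predicted_pic50 = [5.0, 6.0, 9.0]
picked = importance_resample(ligands, predicted_pic50, n_keep=4, temperature=0.01)
```

A low temperature concentrates the resampled set on the highest-scoring candidates, while a high temperature keeps it close to the original generated distribution.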


Fast and Interpretable Machine Learning Modelling of Atmospheric Molecular Clusters

Seppäläinen, Lauri, Kubečka, Jakub, Elm, Jonas, Puolamäki, Kai

arXiv.org Artificial Intelligence

Understanding how atmospheric molecular clusters form and grow is key to resolving one of the biggest uncertainties in climate modelling: the formation of new aerosol particles. While quantum chemistry offers accurate insights into these early-stage clusters, its steep computational costs limit large-scale exploration. In this work, we present a fast, interpretable, and surprisingly powerful alternative: a $k$-nearest neighbour ($k$-NN) regression model. By leveraging chemically informed distance metrics, including a kernel-induced metric and one learned via metric learning for kernel regression (MLKR), we show that simple $k$-NN models can rival more complex kernel ridge regression (KRR) models in accuracy, while reducing computational time by orders of magnitude. We perform this comparison with the well-established Faber-Christensen-Huang-Lilienfeld (FCHL19) molecular descriptor, but other descriptors (e.g., FCHL18, MBDF, and CM) can be shown to have similar performance. Applied to both simple organic molecules in the QM9 benchmark set and large datasets of atmospheric molecular clusters (sulphuric acid-water and sulphuric-multibase -base systems), our $k$-NN models achieve near-chemical accuracy, scale seamlessly to datasets with over 250,000 entries, and even appear to extrapolate to larger unseen clusters with minimal error (often nearing 1 kcal/mol). With built-in interpretability and straightforward uncertainty estimation, this work positions $k$-NN as a potent tool for accelerating discovery in atmospheric chemistry and beyond.
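The core idea of $k$-NN regression under a custom distance metric can be sketched in a few lines. This is an illustrative toy, not the authors' code: the per-feature `weights` are a hypothetical stand-in for a learned metric such as one from MLKR, the neighbour spread is only one possible uncertainty estimate, and the data are made up.

```python
import math


def knn_predict(train_X, train_y, query, k=3, weights=None):
    """Predict a target as the mean over the k nearest training points.

    `weights` is a hypothetical per-feature scaling standing in for a
    learned metric; None means plain Euclidean distance.
    """
    if weights is None:
        weights = [1.0] * len(query)

    def dist(a, b):
        return math.sqrt(sum(w * (x - y) ** 2 for w, x, y in zip(weights, a, b)))

    # Sort training indices by distance to the query and keep the k closest.
    order = sorted(range(len(train_X)), key=lambda i: dist(train_X[i], query))
    nearest = order[:k]
    values = [train_y[i] for i in nearest]
    mean = sum(values) / len(values)
    # The spread of the neighbour targets serves as a crude uncertainty estimate.
    spread = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))
    return mean, spread, nearest


# Made-up descriptors and target energies.
train_X = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [5.0, 5.0]]
train_y = [1.0, 2.0, 3.0, 10.0]
prediction, uncertainty, neighbours = knn_predict(train_X, train_y, [0.1, 0.1], k=3)
```

Returning the neighbour indices alongside the prediction is what makes the model easy to inspect: each prediction can be traced back to the specific training structures that produced it.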


In silico study on the cytotoxicity against Hela cancer cells of xanthones bioactive compounds from Garcinia cowa: QSAR based on Graph Deep Learning, Network Pharmacology, and Molecular Docking

Son, Nguyen Manh, Vang, Pham Huu, Dung, Nguyen Thi, Thao, Nguyen Manh Ha. Ta Thi, Thuy, Tran Thi Thu, Giang, Phan Minh

arXiv.org Artificial Intelligence

Institute of Natural Products Chemistry, Vietnam Academy of Science and Technology, 18 Hoang Quoc Viet, Nghia Do, Cau Giay, Hanoi, Vietnam

Abstract: Cancer is recognized as a complex group of diseases that contributes to the highest global mortality rates, with increasing prevalence and a trend toward affecting younger populations. It is characterized by uncontrolled proliferation of abnormal cells, invasion of adjacent tissues, and metastasis to distant organs. Garcinia cowa, a traditional medicinal plant widely used in Southeast Asia, including Vietnam, is used to treat fever, cough, indigestion, and parasitic diseases, and as a laxative. Numerous xanthone compounds isolated from this species exhibit a broad spectrum of biological activities, with some showing promise as anti-cancer and antimalarial agents. Network pharmacology analysis identified key bioactive compounds (Rubraxanthone, Garcinone D, Norcowanin, Cowanol, and Cowaxanthone) alongside their primary protein targets (TNF, CTNNB1, SRC, NFKB1, and MTOR), providing critical insights into the molecular mechanisms underlying their anti-cancer effects. The Graph Attention Network algorithm demonstrated superior predictive performance, achieving an R of 0.98 and an RMSE of 0.02 after data augmentation, highlighting its accuracy in predicting pIC50 values for xanthone-based compounds. Additionally, molecular docking revealed MTOR as a potential target through which compounds from Garcinia cowa induce cytotoxicity in HeLa cancer cells.

Keywords: Garcinia cowa, HeLa, Network pharmacology, Graph neural network, Molecular docking

I. Introduction

Cancer is a complex group of diseases and one of the leading causes of mortality worldwide, characterized by the uncontrolled proliferation of abnormal cells, the ability to invade adjacent tissues, and metastasis to distant organs in the body [1, 2].



Quantum Neural Network applications to Protein Binding Affinity Predictions

Teixeira, Erico Souza, Fernandes, Lucas Barros, Inácio, Yara Rodrigues

arXiv.org Artificial Intelligence

Binding energy is a fundamental thermodynamic property that governs molecular interactions, playing a crucial role in fields such as healthcare and the natural sciences. It is particularly relevant in drug development, vaccine design, and other biomedical applications. Over the years, various methods have been developed to estimate protein binding energy, ranging from experimental techniques to computational approaches, with machine learning making significant contributions to this field. Although classical computing has demonstrated strong results in constructing predictive models, the application of quantum computing to machine learning has emerged as a promising alternative. Quantum neural networks (QNNs) have gained traction as a research focus, raising the question of their potential advantages in predicting binding energies. To investigate this potential, this study explored the feasibility of QNNs for this task by proposing thirty variations of multilayer perceptron-based quantum neural networks. These variations span three distinct architectures, each incorporating ten different quantum circuits to configure their quantum layers. The performance of these quantum models was compared with that of a state-of-the-art classical multilayer perceptron-based artificial neural network, evaluating both accuracy and training time. A primary dataset was used for training, while two additional datasets containing entirely unseen samples were employed for testing. Results indicate that the quantum models achieved approximately 20% higher accuracy on one unseen dataset, although their accuracy was lower on the other datasets. Notably, quantum models exhibited training times several orders of magnitude shorter than their classical counterparts, highlighting their potential for efficient protein binding energy prediction.


A neural network machine-learning approach for characterising hydrogen trapping parameters from TDS experiments

Marrani, N., Hageman, T., Martínez-Pañeda, E.

arXiv.org Artificial Intelligence

The hydrogen trapping behaviour of metallic alloys is generally characterised using Thermal Desorption Spectroscopy (TDS). However, because TDS is an indirect method, extracting key parameters (trap binding energies and densities) remains a significant challenge. To address this limitation, this work introduces a machine learning-based scheme for parameter identification from TDS spectra. A multi-Neural Network (NN) model is developed and trained exclusively on synthetic data to predict trapping parameters directly from experimental data. The model comprises two multi-layer, fully connected, feed-forward NNs trained with backpropagation. The first network (classification model) predicts the number of distinct trap types. The second network (regression model) then predicts the corresponding trap densities and binding energies. The NN architectures, hyperparameters, and data pre-processing were optimised to minimise the amount of training data. The proposed model demonstrated strong predictive capabilities when applied to three tempered martensitic steels of different compositions. The code developed is freely provided.


Predicting mutational effects on protein binding from folding energy

Deng, Arthur, Householder, Karsten, Wu, Fang, Thrun, Sebastian, Garcia, K. Christopher, Trippe, Brian

arXiv.org Artificial Intelligence

Accurate estimation of mutational effects on protein-protein binding energies is an open problem with applications in structural biology and therapeutic design. Several deep learning predictors for this task have been proposed, but, presumably due to the scarcity of binding data, these methods underperform computationally expensive estimates based on empirical force fields. In response, we propose a transfer-learning approach that leverages advances in protein sequence modeling and folding stability prediction for this task. The key idea is to parameterize the binding energy as the difference between the folding energy of the protein complex and the sum of the folding energies of its binding partners. We show that using a pre-trained inverse-folding model as a proxy for folding energy provides strong zero-shot performance, and can be fine-tuned with (1) copious folding energy measurements and (2) more limited binding energy measurements. The resulting predictor, StaB-ddG, is the first deep learning predictor to match the accuracy of the state-of-the-art empirical force-field method FoldX, while offering an over 1,000x speed-up.
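The parameterization described in this abstract can be written out for a two-partner complex. The following is a sketch under assumed notation: the symbols $A$, $B$, and $AB$ for the partners and the complex, and the sign convention, are illustrative and not taken from the paper.

```latex
\begin{align*}
\Delta G_{\mathrm{bind}}(AB)
  &= \Delta G_{\mathrm{fold}}(AB)
   - \bigl[\Delta G_{\mathrm{fold}}(A) + \Delta G_{\mathrm{fold}}(B)\bigr] \\
\Delta\Delta G_{\mathrm{bind}}
  &= \Delta G_{\mathrm{bind}}^{\mathrm{mut}}(AB) - \Delta G_{\mathrm{bind}}^{\mathrm{wt}}(AB) \\
  &= \Delta\Delta G_{\mathrm{fold}}(AB)
   - \bigl[\Delta\Delta G_{\mathrm{fold}}(A) + \Delta\Delta G_{\mathrm{fold}}(B)\bigr]
\end{align*}
```

Here each $\Delta\Delta G_{\mathrm{fold}}$ is the mutant-minus-wild-type folding energy of the corresponding species, so a folding-stability predictor applied to the complex and to each partner yields an estimate of the mutational effect on binding.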


AbFlowNet: Optimizing Antibody-Antigen Binding Energy via Diffusion-GFlowNet Fusion

Abir, Abrar Rahman, Shahgir, Haz Sameen, Ratul, Md Rownok Zahan, Tahmid, Md Toki, Steeg, Greg Ver, Dong, Yue

arXiv.org Artificial Intelligence

Complementarity Determining Regions (CDRs) are critical segments of an antibody that facilitate binding to specific antigens. Current computational methods for CDR design utilize reconstruction losses and do not jointly optimize binding energy, a crucial metric for antibody efficacy. Rather, binding energy optimization is done through computationally expensive Online Reinforcement Learning (RL) pipelines that rely heavily on unreliable binding energy estimators. In this paper, we propose AbFlowNet, a novel generative framework that integrates GFlowNet with Diffusion models. By framing each diffusion step as a state in the GFlowNet framework, AbFlowNet jointly optimizes standard diffusion losses and binding energy by directly incorporating energy signals into the training process, thereby unifying diffusion and reward optimization in a single procedure. Experimental results show that AbFlowNet outperforms the base diffusion model by 3.06% in amino acid recovery, 20.40% in geometric reconstruction (RMSD), and 3.60% in binding energy improvement ratio. AbFlowNet also decreases Top-1 total energy and binding energy errors by 24.8% and 38.1% without pseudo-labeling the test dataset or using computationally expensive online RL regimes.


In Silico Pharmacokinetic and Molecular Docking Studies of Natural Plants against Essential Protein KRAS for Treatment of Pancreatic Cancer

Kappan, Marsha Mariya, George, Joby

arXiv.org Artificial Intelligence

Pancreatic Ductal Adenocarcinoma (PDAC), a type of pancreatic cancer, is regarded as one of the main causes of mortality in recent years. Evidence from several studies supports the view that the oncogenic KRAS (Ki-ras2 Kirsten rat sarcoma viral oncogene) mutation is the major cause of pancreatic cancer. KRAS acts as an on-off switch that promotes cell growth. When the KRAS gene is mutated, however, the switch stays in one position, allowing cells to grow uncontrollably. This uncontrolled multiplication of cells causes cancer growth. Therefore, KRAS was selected as the target protein in the study. Fifty plant-derived compounds were selected for the study. To determine whether the examined compounds could bind to the binding pocket of the KRAS complex, molecular docking was performed. Computational analyses were used to assess the ability of the tested substances to cross the Blood Brain Barrier (BBB). To predict the bioactivity of the ligands, five machine learning models were created, and the best one among them was chosen for analyzing the bioactivity of each ligand. From the fifty plant-derived compounds, the six with the lowest binding energies were selected. The bioactivity of these six compounds was then analyzed using a Random Forest Regression model. The Absorption, Distribution, Metabolism, and Excretion (ADME) properties of the compounds were also analyzed. The results showed that borneol has powerful effects and acts as a promising agent for the treatment of pancreatic cancer. This suggests that borneol, found in plants such as mint, ginger, and rosemary, is an effective compound for the treatment of pancreatic cancer.